Genetically modified microorganisms that carry out the heterologous production of modified versions of the surfactant protein Lv-ranaspumin-1 (Lv-Rsn-1), the modified versions of said surfactant protein, the synthetic genes encoding said surfactant protein, the expression cassettes containing said synthetic genes, and the expression vectors containing said synthetic genes

ABSTRACT

The present invention relates to the heterologous production in microorganisms of modified versions of a predicted isoform of the surfactant protein Lv-ranaspumin-1 (Lv-Rsn-1), whose sequence was inferred from analyses of the protein extract of the nest foam of the Northeastern Pepper Frog (Leptodactylus vastus). More specifically, it relates to two surfactant proteins that are modified versions of the predicted isoform of Lv-Rsn-1; to two synthetic genes, each encoding one of these modified versions; to two expression cassettes, each containing one of the synthetic genes; to two expression vectors, each containing one of the synthetic genes; and to two transgenic microorganisms, a bacterium and a yeast, each transformed with one of these synthetic genes and heterologously producing one of the modified versions. Lv-Rsn-1 has surfactant, emulsifying and dispersing properties, among others, and its heterologous production allows it to be used in various applications and industrial products without the need to extract it from the frog nest foam.

LISTING OF BIOLOGICAL SEQUENCES

<110> Petróleo Brasileiro S. A.—Petrobras

<120> GENETICALLY MODIFIED MICROORGANISMS THAT CARRY OUT THE HETEROLOGOUS PRODUCTION OF MODIFIED VERSIONS OF THE SURFACTANT PROTEIN LV-RANASPUMIN-1 (LV-RSN-1), THE MODIFIED VERSIONS OF THIS SURFACTANT PROTEIN, THE SYNTHETIC GENES THAT ENCODE THIS SURFACTANT PROTEIN, THE EXPRESSION CASSETTES CONTAINING THESE SYNTHETIC GENES, AND THE EXPRESSION VECTORS CONTAINING THESE SYNTHETIC GENES

<160>10

<210>1

<211>216

<212> PRT

<213> Leptodactylus vastus

<223> predicted sequence for one of the isoforms of Lv-ranaspumin-1

<400>1

Leu Leu Glu Gly Phe Leu Val Gly Gly Gly Val Pro Gly Pro Gly Thr

5 10 15

Ala Cys Leu Thr Lys Ala Leu Lys Asp Ser Gly Asp Leu Leu Val Glu

20 25 30

Leu Ala Val Ile Ile Cys Ala Tyr Gln Asn Gly Lys Asp Leu Gln Glu

35 40 45

Gln Asp Phe Lys Glu Leu Lys Glu Leu Leu Glu Arg Thr Leu Glu Arg

55 60

Ala Gly Cys Ala Leu Asp Asp Ile Val Ala Asp Leu Gly Leu Glu Glu

70 75 80

Leu Leu Gly Ser Ile Gly Val Ser Thr Gly Asp Ile Ile Gln Gly Leu

85 90 95

Tyr Lys Leu Leu Lys Glu Leu Lys Ile Asp Glu Thr Val Phe Asn Ala

100 105 110

Val Cys Asp Val Thr Lys Lys Met Leu Asp Asn Lys Cys Leu Pro Lys

115 120 125

Ile Leu Gln Gly Asp Leu Val Lys Phe Leu Asp Leu Lys Tyr Lys Val

130 135 140

Cys Ile Glu Gly Gly Asp Pro Glu Leu Ile Ile Lys Asp Leu Lys Ile

145 150 155 160

Ile Leu Glu Arg Leu Pro Cys Val Leu Gly Gly Val Gly Leu Asp Asp

165 170 175

Leu Phe Lys Asn Ile Phe Val Lys Asp Gly Ile Leu Ser Phe Glu Gly

180 185 190

Ile Ala Lys Pro Leu Gly Asp Leu Leu Ile Leu Val Leu Cys Pro Asn

195 200 205

Val Lys Asn Ile Asn Val Ser Ser

210 215

<210>2

<211>648

<212> DNA

<213> Artificial Sequence

<220>

<221> CDS

<222> (1)..(648)

<223> encoding sequence of one of the isoforms of Lv-ranaspumin-1 after reverse translation of the predicted amino acid sequence

<400>2

ctg ctg gaa ggc ttt ctg gtg ggc ggc ggc gtg ccg ggc ccg ggc acc 48

Leu Leu Glu Gly Phe Leu Val Gly Gly Gly Val Pro Gly Pro Gly Thr

5 10 15

gcg tgc ctg acc aaa gcg ctg aaa gat agc ggc gat ctg ctg gtg gaa

96

Ala Cys Leu Thr Lys Ala Leu Lys Asp Ser Gly Asp Leu Leu Val Glu

20 25 30

ctg gcg gtg att att tgc gcg tat cag aac ggc aaa gat ctg cag gaa

144

Leu Ala Val Ile Ile Cys Ala Tyr Gln Asn Gly Lys Asp Leu Gln Glu

35 40 45

cag gat ttt aaa gaa ctg aaa gaa ctg ctg gaa cgc acc ctg gaa cgc

192

Gln Asp Phe Lys Glu Leu Lys Glu Leu Leu Glu Arg Thr Leu Glu Arg

50 55 60

gcg ggc tgc gcg ctg gat gat att gtg gcg gat ctg ggc ctg gaa gaa

240

Ala Gly Cys Ala Leu Asp Asp Ile Val Ala Asp Leu Gly Leu Glu Glu

65 70 75 80

ctg ctg ggc agc att ggc gtg agc acc ggc gat att att cag ggc ctg 288

Leu Leu Gly Ser Ile Gly Val Ser Thr Gly Asp Ile Ile Gln Gly Leu

85 90 95

tat aaa ctg ctg aaa gaa ctg aaa att gat gaa acc gtg ttt aac gcg

336

Tyr Lys Leu Leu Lys Glu Leu Lys Ile Asp Glu Thr Val Phe Asn Ala

100 105 110

gtg tgc gat gtg acc aaa aaa atg ctg gat aac aaa tgc ctg ccg aaa

384

Val Cys Asp Val Thr Lys Lys Met Leu Asp Asn Lys Cys Leu Pro Lys

115 120 125

att ctg cag ggc gat ctg gtg aaa ttt ctg gat ctg aaa tat aaa gtg

432

Ile Leu Gln Gly Asp Leu Val Lys Phe Leu Asp Leu Lys Tyr Lys Val

130 135 140

tgc att gaa ggc ggc gat ccg gaa ctg att att aaa gat ctg aaa att

480

Cys Ile Glu Gly Gly Asp Pro Glu Leu Ile Ile Lys Asp Leu Lys Ile

145 150 155 160

att ctg gaa cgc ctg ccg tgc gtg ctg ggc ggc gtg ggc ctg gat gat 528

Ile Leu Glu Arg Leu Pro Cys Val Leu Gly Gly Val Gly Leu Asp Asp

165 170 175

ctg ttt aaa aac att ttt gtg aaa gat ggc att ctg agc ttt gaa ggc

576

Leu Phe Lys Asn Ile Phe Val Lys Asp Gly Ile Leu Ser Phe Glu Gly

180 185 190

att gcg aaa ccg ctg ggc gat ctg ctg att ctg gtg ctg tgc ccg aac 624

Ile Ala Lys Pro Leu Gly Asp Leu Leu Ile Leu Val Leu Cys Pro Asn

195 200 205

gtg aaa aac att aac gtg agc agc

648

Val Lys Asn Ile Asn Val Ser Ser

210 215
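As an illustrative check of the reverse translation described for SEQ ID NO:2, the following minimal Python sketch translates the first sixteen codons of SEQ ID NO:2 with the standard genetic code and compares the result with the first sixteen residues of SEQ ID NO:1. The helper names (CODON_TABLE, THREE_LETTER, translate) are hypothetical and are not part of the listing; only the two sequence fragments are taken from it.

```python
# Standard genetic code (NCBI translation table 1), laid out in TCAG order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter residue names, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace ignored) into three-letter residues."""
    seq = "".join(dna.split()).lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    residues = (CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))
    return " ".join(THREE_LETTER[r] for r in residues)


# Fragments copied from the listing: first codon line of SEQ ID NO:2 and
# first residue line of SEQ ID NO:1.
seq2_start = "ctg ctg gaa ggc ttt ctg gtg ggc ggc ggc gtg ccg ggc ccg ggc acc"
seq1_start = "Leu Leu Glu Gly Phe Leu Val Gly Gly Gly Val Pro Gly Pro Gly Thr"

print(translate(seq2_start) == seq1_start)
```

The same function could be applied to the full 648-nucleotide sequence to confirm that it encodes all 216 residues of SEQ ID NO:1.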

<210>3

<211>651

<212> DNA

<213> Artificial Sequence

<220>

<221> CDS

<222> (1)..(651)

<223> codon frequency optimization of SEQ ID NO:2 for expression in bacteria and addition of the ATG start codon

<400>3

atg ctg ctg gaa ggt ttt ctg gtt ggg ggc ggt gtt ccg ggt cca ggc

48

Met Leu Leu Glu Gly Phe Leu Val Gly Gly Gly Val Pro Gly Pro Gly

5 10 15

acg gcc tgc ttg acg aag gct ctg aaa gat agc ggt gac ctg ctg gtg

96

Thr Ala Cys Leu Thr Lys Ala Leu Lys Asp Ser Gly Asp Leu Leu Val

20 25 30

gag tta gcg gtt att att tgt gca tac cag aat ggc aaa gac ctt cag

144

Glu Leu Ala Val Ile Ile Cys Ala Tyr Gln Asn Gly Lys Asp Leu Gln

35 40 45

gag cag gac ttc aaa gaa ctg aag gaa ttg ctg gaa cgt aca ttg gaa

192

Glu Gln Asp Phe Lys Glu Leu Lys Glu Leu Leu Glu Arg Thr Leu Glu

55 60

cgt gcc ggt tgt gcc ctc gat gat att gtg gcc gat tta ggt ctg gaa

240

Arg Ala Gly Cys Ala Leu Asp Asp Ile Val Ala Asp Leu Gly Leu Glu

70 75 80

gaa ctg ctg ggc tcc atc ggc gtt agt acc ggc gat att atc cag ggt

288

Glu Leu Leu Gly Ser Ile Gly Val Ser Thr Gly Asp Ile Ile Gln Gly

85 90 95

ctg tat aaa ctg ttg aag gag tta aaa atc gac gag acc gtc ttt aat

336

Leu Tyr Lys Leu Leu Lys Glu Leu Lys Ile Asp Glu Thr Val Phe Asn

100 105 110

gcg gtc tgc gat gtg acc aaa aaa atg ctg gat aac aag tgc tta ccg

384

Ala Val Cys Asp Val Thr Lys Lys Met Leu Asp Asn Lys Cys Leu Pro

115 120 125

aaa att ctg caa gga gat ctg gta aag ttc ctt gat ctg aag tat aaa

432

Lys Ile Leu Gln Gly Asp Leu Val Lys Phe Leu Asp Leu Lys Tyr Lys

130 135 140

gtt tgt att gaa ggt ggc gat cca gaa ctg att att aag gat ctg aaa

480

Val Cys Ile Glu Gly Gly Asp Pro Glu Leu Ile Ile Lys Asp Leu Lys

145 150 155 160

atc atc ctg gaa cgg ctt ccg tgt gtg ttg ggt gga gtc ggt ttg gat

528

Ile Ile Leu Glu Arg Leu Pro Cys Val Leu Gly Gly Val Gly Leu Asp

165 170 175

gat ctc ttt aag aac att ttt gtt aag gat ggg att ctg tcc ttc gaa

576

Asp Leu Phe Lys Asn Ile Phe Val Lys Asp Gly Ile Leu Ser Phe Glu

180 185 190

ggt att gcg aaa cct ctt ggt gac ctt ctc atc ctt gtc tta tgc ccg

624

Gly Ile Ala Lys Pro Leu Gly Asp Leu Leu Ile Leu Val Leu Cys Pro

195 200 205

aac gtc aag aat atc aat gta tcc tct

651

Asn Val Lys Asn Ile Asn Val Ser Ser

210 215

<210>4

<211>697

<212> DNA

<213> Artificial Sequence

<220> CDS

<223> the SEQ ID NO:3 including the restriction site for NdeI, the encoding sequence of the polyhistidine tag, the restriction site for EcoRI, the sequence encoding the cleavage site for TEV, the restriction site for NdeI, and the restriction site for XhoI

<400>4

g aat tct gaa aac ttg tat ttc cag ggc agc cat atg atg ctg ctg

46

Asn Ser Glu Asn Leu Tyr Phe Gln Gly Ser His Met Met Leu Leu

5 10 15

gaa ggt ttt ctg gtt ggg ggc ggt gtt ccg ggt cca ggc acg gcc tgc

94

Glu Gly Phe Leu Val Gly Gly Gly Val Pro Gly Pro Gly Thr Ala Cys

20 25 30

ttg acg aag gct ctg aaa gat agc ggt gac ctg ctg gtg gag tta gcg

142

Leu Thr Lys Ala Leu Lys Asp Ser Gly Asp Leu Leu Val Glu Leu Ala

35 40 45

gtt att att tgt gca tac cag aat ggc aaa gac ctt cag gag cag gac

190

Val Ile Ile Cys Ala Tyr Gln Asn Gly Lys Asp Leu Gln Glu Gln Asp

50 55 60

ttc aaa gaa ctg aag gaa ttg ctg gaa cgt aca ttg gaa cgt gcc ggt

238

Phe Lys Glu Leu Lys Glu Leu Leu Glu Arg Thr Leu Glu Arg Ala Gly

70 75

tgt gcc ctc gat gat att gtg gcc gat tta ggt ctg gaa gaa ctg ctg

286

Cys Ala Leu Asp Asp Ile Val Ala Asp Leu Gly Leu Glu Glu Leu Leu

85 90 95

ggc tcc atc ggc gtt agt acc ggc gat att atc cag ggt ctg tat aaa

334

Gly Ser Ile Gly Val Ser Thr Gly Asp Ile Ile Gln Gly Leu Tyr Lys

100 105 110

ctg ttg aag gag tta aaa atc gac gag acc gtc ttt aat gcg gtc tgc

382

Leu Leu Lys Glu Leu Lys Ile Asp Glu Thr Val Phe Asn Ala Val Cys

115 120 125

gat gtg acc aaa aaa atg ctg gat aac aag tgc tta ccg aaa att ctg

430

Asp Val Thr Lys Lys Met Leu Asp Asn Lys Cys Leu Pro Lys Ile Leu

130 135 140

caa gga gat ctg gta aag ttc ctt gat ctg aag tat aaa gtt tgt att

478

Gln Gly Asp Leu Val Lys Phe Leu Asp Leu Lys Tyr Lys Val Cys Ile

145 150 155

gaa ggt ggc gat cca gaa ctg att att aag gat ctg aaa atc atc ctg

526

Glu Gly Gly Asp Pro Glu Leu Ile Ile Lys Asp Leu Lys Ile Ile Leu

160 165 170 175

gaa cgg ctt ccg tgt gtg ttg ggt gga gtc ggt ttg gat gat ctc ttt

574

Glu Arg Leu Pro Cys Val Leu Gly Gly Val Gly Leu Asp Asp Leu Phe

180 185 190

aag aac att ttt gtt aag gat ggg att ctg tcc ttc gaa ggt att gcg

622

Lys Asn Ile Phe Val Lys Asp Gly Ile Leu Ser Phe Glu Gly Ile Ala

195 200 205

aaa cct ctt ggt gac ctt ctc atc ctt gtc tta tgc ccg aac gtc aag

670

Lys Pro Leu Gly Asp Leu Leu Ile Leu Val Leu Cys Pro Asn Val Lys

210 215 220

aat atc aat gta tcc tct taactcgag

697

Asn Ile Asn Val Ser Ser

225
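The description of SEQ ID NO:4 names restriction sites for NdeI, EcoRI and XhoI. A minimal sketch of how those sites could be located in the listed sequence is given below, assuming the commonly used recognition sequences GAATTC (EcoRI), CATATG (NdeI) and CTCGAG (XhoI). The names SITES and find_sites are hypothetical; the two fragments are the first and last nucleotide lines of SEQ ID NO:4 as listed above.

```python
# Recognition sequences (assumed standard values, lower case to match the listing).
SITES = {"EcoRI": "gaattc", "NdeI": "catatg", "XhoI": "ctcgag"}

# First and last nucleotide lines of SEQ ID NO:4, copied from the listing.
five_prime = "g aat tct gaa aac ttg tat ttc cag ggc agc cat atg atg ctg ctg"
three_prime = "aat atc aat gta tcc tct taactcgag"


def find_sites(fragment: str) -> dict:
    """Return the 0-based offset of the first match of each site present."""
    seq = "".join(fragment.split()).lower()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


print(find_sites(five_prime))
print(find_sites(three_prime))
```

Offsets are relative to the start of each fragment, not to the full sequence, and only the first occurrence of each site is reported.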

<210>5

<211>6395

<212> DNA

<213> Artificial Sequence

<220>

<223> pPBUFCBac-LvRsn1 expression vector resulting from the insertion of SEQ ID NO:4 into SEQ ID NO:5

<400>5

gatctcgatc ccgcgaaatt aatacgactc actatagggg aattgtgagc ggataacaat 60

tcccctctag aaataatttt gtttaaactt taagaaggag atatacatat g cat cat 117

His His

cat cat cat cac gtg aat tct gaa aac ttg tat ttc cag ggc agc cat 165

His His His His Val Asn Ser Glu Asn Leu Tyr Phe Gln Gly Ser His

5 10 15

atg atg ctg ctg gaa ggt ttt ctg gtt ggg ggc ggt gtt ccg ggt cca 213

Met Met Leu Leu Glu Gly Phe Leu Val Gly Gly Gly Val Pro Gly Pro

25 30

ggc acg gcc tgc ttg acg aag gct ctg aaa gat agc ggt gac ctg ctg 261

Gly Thr Ala Cys Leu Thr Lys Ala Leu Lys Asp Ser Gly Asp Leu Leu

40 45 50

gtg gag tta gcg gtt att att tgt gca tac cag aat ggc aaa gac ctt 309

Val Glu Leu Ala Val Ile Ile Cys Ala Tyr Gln Asn Gly Lys Asp Leu

55 60 65

cag gag cag gac ttc aaa gaa ctg aag gaa ttg ctg gaa cgt aca ttg 357

Gln Glu Gln Asp Phe Lys Glu Leu Lys Glu Leu Leu Glu Arg Thr Leu

70 75 80

gaa cgt gcc ggt tgt gcc ctc gat gat att gtg gcc gat tta ggt ctg 405

Glu Arg Ala Gly Cys Ala Leu Asp Asp Ile Val Ala Asp Leu Gly Leu

85 90 95

gaa gaa ctg ctg ggc tcc atc ggc gtt agt acc ggc gat att atc cag 453

Glu Glu Leu Leu Gly Ser Ile Gly Val Ser Thr Gly Asp Ile Ile Gln

100 105 110

ggt ctg tat aaa ctg ttg aag gag tta aaa atc gac gag acc gtc ttt 501

Gly Leu Tyr Lys Leu Leu Lys Glu Leu Lys Ile Asp Glu Thr Val Phe

115 120 125 130

aat gcg gtc tgc gat gtg acc aaa aaa atg ctg gat aac aag tgc tta 549

Asn Ala Val Cys Asp Val Thr Lys Lys Met Leu Asp Asn Lys Cys Leu

135 140 145

ccg aaa att ctg caa gga gat ctg gta aag ttc ctt gat ctg aag tat 597

Pro Lys Ile Leu Gln Gly Asp Leu Val Lys Phe Leu Asp Leu Lys Tyr

150 155 160

aaa gtt tgt att gaa ggt ggc gat cca gaa ctg att att aag gat ctg 645

Lys Val Cys Ile Glu Gly Gly Asp Pro Glu Leu Ile Ile Lys Asp Leu

165 170 175

aaa atc atc ctg gaa cgg ctt ccg tgt gtg ttg ggt gga gtc ggt ttg 693

Lys Ile Ile Leu Glu Arg Leu Pro Cys Val Leu Gly Gly Val Gly Leu

180 185 190

gat gat ctc ttt aag aac att ttt gtt aag gat ggg att ctg tcc ttc 741

Asp Asp Leu Phe Lys Asn Ile Phe Val Lys Asp Gly Ile Leu Ser Phe

195 200 205 210

gaa ggt att gcg aaa cct ctt ggt gac ctt ctc atc ctt gtc tta tgc 789

Glu Gly Ile Ala Lys Pro Leu Gly Asp Leu Leu Ile Leu Val Leu Cys

215 220 225

ccg aac gtc aag aat atc aat gta tcc tct taactcgaga tcgatgatat 819

Pro Asn Val Lys Asn Ile Asn Val Ser Ser

230 235

tcgagcctag gtataatcgg atccggctgc taacaaagcc cgaaaggaag ctgagttggc 879

tgctgccacc gctgagcaat aactagcata accccttggg gcctctaaac gggtcttgag 939

gggttttttg ctgaaaggag gaactatatc cggatatccc gcaagaggcc cggcagtacc 999

ggcataacca agcctatgcc tacagcatcc agggtgacgg tgccgaggat gacgatgagc 1059

gcattgttag atttcataca cggtgcctga ctgcgttagc aatttaactg tgataaacta 1119

ccgcattaaa gctagcttat cgatgataag ctgtcaaaca tgagaattaa ttcttgaaga 1179

cgaaagggcc tcgtgatacg cctattttta taggttaatg tcatgataat aatggtttct 1239

tagacgtcag gtggcacttt tcggggaaat gtgcgcggaa cccctatttg tttatttttc 1299

taaatacatt caaatatgta tccgctcatg agacaataac cctgataaat gcttcaataa 1359

tattgaaaaa ggaagagtat gagtattcaa catttccgtg tcgcccttat tccctttttt 1419

gcggcatttt gccttcctgt ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct 1479

gaagatcagt tgggtgcacg agtgggttac atcgaactgg atctcaacag cggtaagatc 1539

cttgagagtt ttcgccccga agaacgtttt ccaatgatga gcacttttaa agttctgcta 1599

tgtggcgcgg tattatcccg tgttgacgcc gggcaagagc aactcggtcg ccgcatacac 1659

tattctcaga atgacttggt tgagtactca ccagtcacag aaaagcatct tacggatggc 1719

atgacagtaa gagaattatg cagtgctgcc ataaccatga gtgataacac tgcggccaac 1779

ttacttctga caacgatcgg aggaccgaag gagctaaccg cttttttgca caacatgggg 1839

gatcatgtaa ctcgccttga tcgttgggaa ccggagctga atgaagccat accaaacgac 1899

gagcgtgaca ccacgatgcc tgcagcaatg gcaacaacgt tgcgcaaact attaactggc 1959

gaactactta ctctagcttc ccggcaacaa ttaatagact ggatggaggc ggataaagtt 2019

gcaggaccac ttctgcgctc ggcccttccg gctggctggt ttattgctga taaatctgga 2079

gccggtgagc gtgggtctcg cggtatcatt gcagcactgg ggccagatgg taagccctcc 2139

cgtatcgtag ttatctacac gacggggagt caggcaacta tggatgaacg aaatagacag 2199

atcgctgaga taggtgcctc actgattaag cattggtaac tgtcagacca agtttactca 2259

tatatacttt agattgattt aaaacttcat ttttaattta aaaggatcta ggtgaagatc 2319

ctttttgata atctcatgac caaaatccct taacgtgagt tttcgttcca ctgagcgtca 2379

gaccccgtag aaaagatcaa aggatcttct tgagatcctt tttttctgcg cgtaatctgc 2439

tgcttgcaaa caaaaaaacc accgctacca gcggtggttt gtttgccgga tcaagagcta 2499

ccaactcttt ttccgaaggt aactggcttc agcagagcgc agataccaaa tactgtcctt 2559

ctagtgtagc cgtagttagg ccaccacttc aagaactctg tagcaccgcc tacatacctc 2619

gctctgctaa tcctgttacc agtggctgct gccagtggcg ataagtcgtg tcttaccggg 2679

ttggactcaa gacgatagtt accggataag gcgcagcggt cgggctgaac ggggggttcg 2739

tgcacacagc ccagcttgga gcgaacgacc tacaccgaac tgagatacct acagcgtgag 2799

ctatgagaaa gcgccacgct tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc 2859

agggtcggaa caggagagcg cacgagggag cttccagggg gaaacgcctg gtatctttat 2919

agtcctgtcg ggtttcgcca cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg 2979

gggcggagcc tatggaaaaa cgccagcaac gcggcctttt tacggttcct ggccttttgc 3039

tggccttttg ctcacatgtt ctttcctgcg ttatcccctg attctgtgga taaccgtatt 3099

accgcctttg agtgagctga taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca 3159

gtgagcgagg aagcggaaga gcgcctgatg cggtattttc tccttacgca tctgtgcggt 3219

atttcacacc gcaatggtgc actctcagta caatctgctc tgatgccgca tagttaagcc 3279

agtatacact ccgctatcgc tacgtgactg ggtcatggct gcgccccgac acccgccaac 3339

acccgctgac gcgccctgac gggcttgtct gctcccggca tccgcttaca gacaagctgt 3399

gaccgtctcc gggagctgca tgtgtcagag gttttcaccg tcatcaccga aacgcgcgag 3459

gcagctgcgg taaagctcat cagcgtggtc gtgaagcgat tcacagatgt ctgcctgttc 3519

atccgcgtcc agctcgttga gtttctccag aagcgttaat gtctggcttc tgataaagcg 3579

ggccatgtta agggcggttt tttcctgttt ggtcactgat gcctccgtgt aagggggatt 3639

tctgttcatg ggggtaatga taccgatgaa acgagagagg atgctcacga tacgggttac 3699

tgatgatgaa catgcccggt tactggaacg ttgtgagggt aaacaactgg cggtatggat 3759

gcggcgggac cagagaaaaa tcactcaggg tcaatgccag cgcttcgtta atacagatgt 3819

aggtgttcca cagggtagcc agcagcatcc tgcgatgcag atccggaaca taatggtgca 3879

gggcgctgac ttccgcgttt ccagacttta cgaaacacgg aaaccgaaga ccattcatgt 3939

tgttgctcag gtcgcagacg ttttgcagca gcagtcgctt cacgttcgct cgcgtatcgg 3999

tgattcattc tgctaaccag taaggcaacc ccgccagcct agccgggtcc tcaacgacag 4059

gagcacgatc atgcgcaccc gtggccagga cccaacgctg cccgagatgc gccgcgtgcg 4119

gctgctggag atggcggacg cgatggatat gttctgccaa gggttggttt gcgcattcac 4179

agttctccgc aagaattgat tggctccaat tcttggagtg gtgaatccgt tagcgaggtg 4239

ccgccggctt ccattcaggt cgaggtggcc cggctccatg caccgcgacg caacgcgggg 4299

aggcagacaa ggtatagggc ggcgcctaca atccatgcca acccgttcca tgtgctcgcc 4359

gaggcggcat aaatcgccgt gacgatcagc ggtccaatga tcgaagttag gctggtaaga 4419

gccgcgagcg atccttgaag ctgtccctga tggtcgtcat ctacctgcct ggacagcatg 4479

gcctgcaacg cgggcatccc gatgccgccg gaagcgagaa gaatcataat ggggaaggcc 4539

atccagcctc gcgtcgcgaa cgccagcaag acgtagccca gcgcgtcggc cgccatgccg 4599

gcgataatgg cctgcttctc gccgaaacgt ttggtggcgg gaccagtgac gaaggcttga 4659

gcgagggcgt gcaagattcc gaataccgca agcgacaggc cgatcatcgt cgcgctccag 4719

cgaaagcggt cctcgccgaa aatgacccag agcgctgccg gcacctgtcc tacgagttgc 4779

atgataaaga agacagtcat aagtgcggcg acgatagtca tgccccgcgc ccaccggaag 4839

gagctgactg ggttgaaggc tctcaagggc atcggtcgag atcccggtgc ctaatgagtg 4899

agctaactta cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg 4959

tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc 5019

cagggtggtt tttcttttca ccagtgagac gggcaacagc tgattgccct tcaccgcctg 5079

gccctgagag agttgcagca agcggtccac gctggtttgc cccagcaggc gaaaatcctg 5139

tttgatggtg gttaacggcg ggatataaca tgagctgtct tcggtatcgt cgtatcccac 5199

taccgagata tccgcaccaa cgcgcagccc ggactcggta atggcgcgca ttgcgcccag 5259

cgccatctga tcgttggcaa ccagcatcgc agtgggaacg atgccctcat tcagcatttg 5319

catggtttgt tgaaaccgga catggcactc cagtcgcctt cccgttccgc tatcggctga 5379

atttgattgc gagtgagata tttatgccag ccagccagac gcagacgcgc cgagacagaa 5439

cttaatgggc ccgctaacag cgcgatttgc tggtgaccca atgcgaccag atgctccacg 5499

cccagtcgcg taccgtcttc atgggagaaa ataatactgt tgatgggtgt ctggtcagag 5559

acatcaagaa ataacgccgg aacattagtg caggcagctt ccacagcaat ggcatcctgg 5619

tcatccagcg gatagttaat gatcagccca ctgacgcgtt gcgcgagaag attgtgcacc 5679

gccgctttac aggcttcgac gccgcttcgt tctaccatcg acaccaccac gctggcaccc 5739

agttgatcgg cgcgagattt aatcgccgcg acaatttgcg acggcgcgtg cagggccaga 5799

ctggaggtgg caacgccaat cagcaacgac tgtttgcccg ccagttgttg tgccacgcgg 5859

ttgggaatgt aattcagctc cgccatcgcc gcttccactt tttcccgcgt tttcgcagaa 5919

acgtggctgg cctggttcac cacgcgggaa acggtctgat aagagacacc ggcatactct 5979

gcgacatcgt ataacgttac tggtttcaca ttcaccaccc tgaattgact ctcttccggg 6039

cgctatcatg ccataccgcg aaaggttttg cgccattcga tggtgtccgg gatctcgacg 6099

ctctccctta tgcgactcct gcattaggaa gcagcccagt agtaggttga ggccgttgag 6159

caccgccgcc gcaaggaatg gtgcatgcaa ggagatggcg cccaacagtc ccccggccac 6219

ggggcctgcc accataccca cgccgaaaca agcgctcatg agcccgaagt ggcgagcccg 6279

atcttcccca tcggtgatgt cggcgatata ggcgccagca accgcacctg tggcgccggt 6339

gatgccggcc acgatgcgtc cggcgtagag gatcga 6375

<210>6

<211>236

<212> PRT

<213> Artificial Sequence

<220>

<223> amino acid sequence of the modified version of the Lv-Rsn-1 surfactant protein encoded by the nucleotide sequence SEQ ID NO:4, which comprises the SEQ ID NO:3

<400>6

His His His His His His Val Asn Ser Glu Asn Leu Tyr Phe Gln Gly

5 10 15

Ser His Met Met Leu Leu Glu Gly Phe Leu Val Gly Gly Gly Val Pro

20 25 30

Gly Pro Gly Thr Ala Cys Leu Thr Lys Ala Leu Lys Asp Ser Gly Asp

35 40 45

Leu Leu Val Glu Leu Ala Val Ile Ile Cys Ala Tyr Gln Asn Gly Lys

55 60

Asp Leu Gln Glu Gln Asp Phe Lys Glu Leu Lys Glu Leu Leu Glu Arg

70 75 80

Thr Leu Glu Arg Ala Gly Cys Ala Leu Asp Asp Ile Val Ala Asp Leu

85 90 95

Gly Leu Glu Glu Leu Leu Gly Ser Ile Gly Val Ser Thr Gly Asp Ile

100 105 110

Ile Gln Gly Leu Tyr Lys Leu Leu Lys Glu Leu Lys Ile Asp Glu Thr

115 120 125

Val Phe Asn Ala Val Cys Asp Val Thr Lys Lys Met Leu Asp Asn Lys

130 135 140

Cys Leu Pro Lys Ile Leu Gln Gly Asp Leu Val Lys Phe Leu Asp Leu

145 150 155 160

Lys Tyr Lys Val Cys Ile Glu Gly Gly Asp Pro Glu Leu Ile Ile Lys

165 170 175

Asp Leu Lys Ile Ile Leu Glu Arg Leu Pro Cys Val Leu Gly Gly Val

180 185 190

Gly Leu Asp Asp Leu Phe Lys Asn Ile Phe Val Lys Asp Gly Ile Leu

195 200 205

Ser Phe Glu Gly Ile Ala Lys Pro Leu Gly Asp Leu Leu Ile Leu Val

210 215 220

Leu Cys Pro Asn Val Lys Asn Ile Asn Val Ser Ser

225 230 235

<210>7

<211>648

<212> DNA

<213> Artificial Sequence

<220>

<221> CDS

<222> (1)..(648)

<223> codon frequency optimization of the SEQ ID NO:2 for expression in yeasts

<400>7

ttg ttg gaa gga ttt ttg gtc gga ggt ggt gtc cct ggt cct ggt aca

48

Leu Leu Glu Gly Phe Leu Val Gly Gly Gly Val Pro Gly Pro Gly Thr

5 10 15

gca tgt ttg act aag gca ttg aaa gac agt gga gac ttg ttg gtt gag

96

Ala Cys Leu Thr Lys Ala Leu Lys Asp Ser Gly Asp Leu Leu Val Glu

20 25 30

ttg gct gtt att att tgt gct tac caa aac ggt aaa gat ttg caa gag

144

Leu Ala Val Ile Ile Cys Ala Tyr Gln Asn Gly Lys Asp Leu Gln Glu

35 40 45

caa gat ttc aag gaa ttg aag gag ttg ttg gaa aga act ttg gaa aga

192

Gln Asp Phe Lys Glu Leu Lys Glu Leu Leu Glu Arg Thr Leu Glu Arg

55 60

gct ggt tgt gct ttg gat gat att gtt gct gat ttg ggt ttg gaa gag

240

Ala Gly Cys Ala Leu Asp Asp Ile Val Ala Asp Leu Gly Leu Glu Glu

70 75 80

ttg ttg ggt tct att ggt gtt tct act gga gat atc atc caa ggt ttg

288

Leu Leu Gly Ser Ile Gly Val Ser Thr Gly Asp Ile Ile Gln Gly Leu

85 90 95

tac aag ttg ttg aag gag ttg aag atc gat gaa act gtt ttt aac gct

336

Tyr Lys Leu Leu Lys Glu Leu Lys Ile Asp Glu Thr Val Phe Asn Ala

100 105 110

gtt tgt gat gtt act aag aaa atg ttg gat aac aag tgt ttg cca aag

384

Val Cys Asp Val Thr Lys Lys Met Leu Asp Asn Lys Cys Leu Pro Lys

115 120 125

atc ttg caa gga gat ttg gtt aag ttc ttg gat ttg aag tac aag gtt

432

Ile Leu Gln Gly Asp Leu Val Lys Phe Leu Asp Leu Lys Tyr Lys Val

130 135 140

tgt atc gaa ggt gga gat cca gaa ttg att att aag gat ttg aag atc

480

Cys Ile Glu Gly Gly Asp Pro Glu Leu Ile Ile Lys Asp Leu Lys Ile

145 150 155 160

atc ttg gag aga ttg cct tgt gtt ttg ggt ggt gtt ggt ttg gat gat

528

Ile Leu Glu Arg Leu Pro Cys Val Leu Gly Gly Val Gly Leu Asp Asp

165 170 175

ttg ttt aaa aac atc ttc gtt aag gat ggt att ttg tct ttc gaa ggt

576

Leu Phe Lys Asn Ile Phe Val Lys Asp Gly Ile Leu Ser Phe Glu Gly

180 185 190

att gct aag cct ttg gga gat ttg ttg att ttg gtt ttg tgt cct aat

624

Ile Ala Lys Pro Leu Gly Asp Leu Leu Ile Leu Val Leu Cys Pro Asn

195 200 205

gtc aag aat atc aat gtt tca tca

648

Val Lys Asn Ile Asn Val Ser Ser

210 215
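SEQ ID NO:3 and SEQ ID NO:7 are described as codon frequency optimizations of SEQ ID NO:2 for bacteria and yeasts, respectively. Such an optimization should change codons without changing the encoded residues. The sketch below illustrates that property on the first codon line of each sequence. It is only a consistency check under the standard genetic code, not the optimization procedure itself, which the listing does not describe; the function names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), laid out in TCAG order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(fragment: str) -> list:
    """Split a space-separated codon line into a list of codons."""
    return fragment.lower().split()


def protein(fragment: str) -> str:
    """One-letter translation of a space-separated codon line."""
    return "".join(CODON_TABLE[c] for c in codons(fragment))


def changed_codons(reference: str, optimized: str) -> int:
    """Count codon positions that differ, requiring identical translations."""
    if protein(reference) != protein(optimized):
        raise ValueError("fragments do not encode the same residues")
    return sum(r != o for r, o in zip(codons(reference), codons(optimized)))


# First codon lines copied from the listing.
seq2 = "ctg ctg gaa ggc ttt ctg gtg ggc ggc ggc gtg ccg ggc ccg ggc acc"
seq3 = "atg ctg ctg gaa ggt ttt ctg gtt ggg ggc ggt gtt ccg ggt cca ggc"
seq7 = "ttg ttg gaa gga ttt ttg gtc gga ggt ggt gtc cct ggt cct ggt aca"

# SEQ ID NO:3 carries an added ATG start codon, so drop it and compare the
# remaining 15 codons with the first 15 codons of SEQ ID NO:2.
seq3_no_start = " ".join(codons(seq3)[1:])
seq2_first15 = " ".join(codons(seq2)[:15])

print(changed_codons(seq2_first15, seq3_no_start))
print(changed_codons(seq2, seq7))
```

A ValueError from changed_codons would indicate that a substitution altered the encoded protein rather than only the codon usage.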

<210>8

<211>685

<212> DNA

<213> Artificial Sequence

<220>

<221> CDS

<222> (1) . . . (648)

<223> the SEQ ID NO:7 after addition of the restriction site for the PstI endonuclease, of two nucleotides to place the coding sequence in the same translation frame as the alpha secretion factor, and of the restriction site for the NotI endonuclease

<400>8

ct gca gga ttg ttg gaa gga ttt ttg gtc gga ggt ggt gtc cct ggt

47

Ala Gly Leu Leu Glu Gly Phe Leu Val Gly Gly Gly Val Pro Gly

5 10 15

cct ggt aca gca tgt ttg act aag gca ttg aaa gac agt gga gac ttg

95

Pro Gly Thr Ala Cys Leu Thr Lys Ala Leu Lys Asp Ser Gly Asp Leu

20 25 30

ttg gtt gag ttg gct gtt att att tgt gct tac caa aac ggt aaa gat

143

Leu Val Glu Leu Ala Val Ile Ile Cys Ala Tyr Gln Asn Gly Lys Asp

35 40 45

ttg caa gag caa gat ttc aag gaa ttg aag gag ttg ttg gaa aga act

191

Leu Gln Glu Gln Asp Phe Lys Glu Leu Lys Glu Leu Leu Glu Arg Thr

50 55 60

ttg gaa aga gct ggt tgt gct ttg gat gat att gtt gct gat ttg ggt

239

Leu Glu Arg Ala Gly Cys Ala Leu Asp Asp Ile Val Ala Asp Leu Gly

70 75

ttg gaa gag ttg ttg ggt tct att ggt gtt tct act gga gat atc atc

287

Leu Glu Glu Leu Leu Gly Ser Ile Gly Val Ser Thr Gly Asp Ile Ile

85 90 95

caa ggt ttg tac aag ttg ttg aag gag ttg aag atc gat gaa act gtt

335

Gln Gly Leu Tyr Lys Leu Leu Lys Glu Leu Lys Ile Asp Glu Thr Val

100 105 110

ttt aac gct gtt tgt gat gtt act aag aaa atg ttg gat aac aag tgt

383

Phe Asn Ala Val Cys Asp Val Thr Lys Lys Met Leu Asp Asn Lys Cys

115 120 125

ttg cca aag atc ttg caa gga gat ttg gtt aag ttc ttg gat ttg aag

431

Leu Pro Lys Ile Leu Gln Gly Asp Leu Val Lys Phe Leu Asp Leu Lys

130 135 140

tac aag gtt tgt atc gaa ggt gga gat cca gaa ttg att att aag gat

479

Tyr Lys Val Cys Ile Glu Gly Gly Asp Pro Glu Leu Ile Ile Lys Asp

145 150 155

ttg aag atc atc ttg gag aga ttg cct tgt gtt ttg ggt ggt gtt ggt

527

Leu Lys Ile Ile Leu Glu Arg Leu Pro Cys Val Leu Gly Gly Val Gly

160 165 170 175

ttg gat gat ttg ttt aaa aac atc ttc gtt aag gat ggt att ttg tct

575

Leu Asp Asp Leu Phe Lys Asn Ile Phe Val Lys Asp Gly Ile Leu Ser

180 185 190

ttc gaa ggt att gct aag cct ttg gga gat ttg ttg att ttg gtt ttg

623

Phe Glu Gly Ile Ala Lys Pro Leu Gly Asp Leu Leu Ile Leu Val Leu

195 200 205

tgt cct aat gtc aag aat atc aat gtt tca tca gag aac ctt tac ttt

671

Cys Pro Asn Val Lys Asn Ile Asn Val Ser Ser Glu Asn Leu Tyr Phe

210 215 220

cag gga gcg gcc gc

685

Gln Gly Ala Ala

225

<210>9

<211>4219

<212> DNA

<213> Artificial Sequence

<220>

<223> pPBUFCYea-LvRsn1 expression vector resulting from the insertion of the SEQ ID NO:9 into the SEQ ID NO:10

<400>9

agatctaaca tccaaagacg aaaggttgaa tgaaaccttt ttgccatccg acatccacag 60

gtccattctc acacataagt gccaaacgca acaggagggg atacactagc agcagaccgt 120

tgcaaacgca ggacctccac tcctcttctc ctcaacaccc acttttgcca tcgaaaaacc 180

agcccagtta ttgggcttga ttggagctcg ctcattccaa ttccttctat taggctacta 240

acaccatgac tttattagcc tgtctatcct ggcccccctg gcgaggttca tgtttgttta 300

tttccgaatg caacaagctc cgcattacac ccgaacatca ctccagatga gggctttctg 360

agtgtggggt caaatagttt catgttcccc aaatggccca aaactgacag tttaaacgct 420

gtcttggaac ctaatatgac aaaagcgtga tctcatccaa gatgaactaa gtttggttcg 480

ttgaaatgct aacggccagt tggtcaaaaa gaaacttcca aaagtcggca taccgtttgt 540

cttgtttggt attgattgac gaatgctcaa aaataatctc attaatgctt agcgcagtct 600

ctctatcgct tctgaacccc ggtgcacctg tgccgaaacg caaatgggga aacacccgct 660

ttttggatga ttatgcattg tctccacatt gtatgcttcc aagattctgg tgggaatact 720

gctgatagcc taacgttcat gatcaaaatt taactgttct aacccctact tgacagcaat 780

atataaacag aaggaagctg ccctgtctta aacctttttt tttatcatca ttattagctt 840

actttcataa ttgcgactgg ttccaattga caagcttttg attttaacga cttttaacga 900

caacttgaga agatcaaaaa acaactaatt attcgaaacg atg aga ttt cct tca 955

Met Arg Phe Pro Ser

1 5

att ttt act gct gtt tta ttc gca gca tcc tcc gca tta gct gct cca 1003

Ile Phe Thr Ala Val Leu Phe Ala Ala Ser Ser Ala Leu Ala Ala Pro

10 15 20

gtc aac act aca aca gaa gat gaa acg gca caa att ccg gct gaa gct 1051

Val Asn Thr Thr Thr Glu Asp Glu Thr Ala Gln Ile Pro Ala Glu Ala

25 30 35

gtc atc ggt tac tca gat tta gaa ggg gat ttc gat gtt gct gtt ttg 1099

Val Ile Gly Tyr Ser Asp Leu Glu Gly Asp Phe Asp Val Ala Val Leu

40 45 50

cca ttt tcc aac agc aca aat aac ggg tta ttg ttt ata aat act act 1147

Pro Phe Ser Asn Ser Thr Asn Asn Gly Leu Leu Phe Ile Asn Thr Thr

60 65

att gcc agc att gct gct aaa gaa gaa ggg gta tct ctc gag aaa aga 1195

Ile Ala Ser Ile Ala Ala Lys Glu Glu Gly Val Ser Leu Glu Lys Arg

75 80 85

gag gct gaa gct gca gga ttg ttg gaa gga ttt ttg gtc gga ggt ggt 1243

Glu Ala Glu Ala Ala Gly Leu Leu Glu Gly Phe Leu Val Gly Gly Gly

90 95 100

gtc cct ggt cct ggt aca gca tgt ttg act aag gca ttg aaa gac agt 1291

Val Pro Gly Pro Gly Thr Ala Cys Leu Thr Lys Ala Leu Lys Asp Ser

105 110 115

gga gac ttg ttg gtt gag ttg gct gtt att att tgt gct tac caa aac 1339

Gly Asp Leu Leu Val Glu Leu Ala Val Ile Ile Cys Ala Tyr Gln Asn

120 125 130

ggt aaa gat ttg caa gag caa gat ttc aag gaa ttg aag gag ttg ttg 1387

Gly Lys Asp Leu Gln Glu Gln Asp Phe Lys Glu Leu Lys Glu Leu Leu

135 140 145

gaa aga act ttg gaa aga gct ggt tgt gct ttg gat gat att gtt gct 1435

Glu Arg Thr Leu Glu Arg Ala Gly Cys Ala Leu Asp Asp Ile Val Ala

150 155 160 165

gat ttg ggt ttg gaa gag ttg ttg ggt tct att ggt gtt tct act gga 1483

Asp Leu Gly Leu Glu Glu Leu Leu Gly Ser Ile Gly Val Ser Thr Gly

170 175 180

gat atc atc caa ggt ttg tac aag ttg ttg aag gag ttg aag atc gat 1531

Asp Ile Ile Gln Gly Leu Tyr Lys Leu Leu Lys Glu Leu Lys Ile Asp

185 190 195

gaa act gtt ttt aac gct gtt tgt gat gtt act aag aaa atg ttg gat 1579

Glu Thr Val Phe Asn Ala Val Cys Asp Val Thr Lys Lys Met Leu Asp

200 205 210

aac aag tgt ttg cca aag atc ttg caa gga gat ttg gtt aag ttc ttg 1627

Asn Lys Cys Leu Pro Lys Ile Leu Gln Gly Asp Leu Val Lys Phe Leu

215 220 225

gat ttg aag tac aag gtt tgt atc gaa ggt gga gat cca gaa ttg att 1675

Asp Leu Lys Tyr Lys Val Cys Ile Glu Gly Gly Asp Pro Glu Leu Ile

230 235 240 245

att aag gat ttg aag atc atc ttg gag aga ttg cct tgt gtt ttg ggt 1723

Ile Lys Asp Leu Lys Ile Ile Leu Glu Arg Leu Pro Cys Val Leu Gly

250 255 260

ggt gtt ggt ttg gat gat ttg ttt aaa aac atc ttc gtt aag gat ggt 1771

Gly Val Gly Leu Asp Asp Leu Phe Lys Asn Ile Phe Val Lys Asp Gly

265 270 275

att ttg tct ttc gaa ggt att gct aag cct ttg gga gat ttg ttg att 1819

Ile Leu Ser Phe Glu Gly Ile Ala Lys Pro Leu Gly Asp Leu Leu Ile

280 285 290

ttg gtt ttg tgt cct aat gtc aag aat atc aat gtt tca tca gag aac 1867

Leu Val Leu Cys Pro Asn Val Lys Asn Ile Asn Val Ser Ser Glu Asn

295 300 305

ctt tac ttt cag gga gcg gcc gcc agc ttt cta gaa caa aaa ctc atc 1915

Leu Tyr Phe Gln Gly Ala Ala Ala Ser Phe Leu Glu Gln Lys Leu Ile

310 315 320 325

tca gaa gag gat ctg aat agc gcc gtc gac cat cat cat cat cat cat 1963

Ser Glu Glu Asp Leu Asn Ser Ala Val Asp His His His His His His

330 335 340

tgagtttgta gccttagaca tgactgttcc tcagttcaag ttgggcactt acgagaagac 2023

cggtcttgct agattctaat caagaggatg tcagaatgcc atttgcctga gagatgcagg 2083

cttcattttt gatacttttt tatttgtaac ctatatagta taggattttt tttgtcattt 2143

tgtttcttct cgtacgagct tgctcctgat cagcctatct cgcagctgat gaatatcttg 2203

tggtaggggt ttgggaaaat cattcgagtt tgatgttttt cttggtattt cccactcctc 2263

ttcagagtac agaagattaa gtgagacctt cgtttgtgcg gatcccccac acaccatagc 2323

ttcaaaatgt ttctactcct tttttactct tccagatttt ctcggactcc gcgcatcgcc 2383

gtaccacttc aaaacaccca agcacagcat actaaatttt ccctctttct tcctctaggg 2443

tgtcgttaat tacccgtact aaaggtttgg aaaagaaaaa agagaccgcc tcgtttcttt 2503

ttcttcgtcg aaaaaggcaa taaaaatttt tatcacgttt ctttttcttg aaattttttt 2563

ttttagtttt tttctctttc agtgacctcc attgatattt aagttaataa acggtcttca 2623

atttctcaag tttcagtttc atttttcttg ttctattaca acttttttta cttcttgttc 2683

attagaaaga aagcatagca atctaatcta aggggcggtg ttgacaatta atcatcggca 2743

tagtatatcg gcatagtata atacgacaag gtgaggaact aaaccatggc caagttgacc 2803

agtgccgttc cggtgctcac cgcgcgcgac gtcgccggag cggtcgagtt ctggaccgac 2863

cggctcgggt tctcccggga cttcgtggag gacgacttcg ccggtgtggt ccgggacgac 2923

gtgaccctgt tcatcagcgc ggtccaggac caggtggtgc cggacaacac cctggcctgg 2983

gtgtgggtgc gcggcctgga cgagctgtac gccgagtggt cggaggtcgt gtccacgaac 3043

ttccgggacg cctccgggcc ggccatgacc gagatcggcg agcagccgtg ggggcgggag 3103

ttcgccctgc gcgacccggc cggcaactgc gtgcacttcg tggccgagga gcaggactga 3163

cacgtccgac ggcggcccac gggtcccagg cctcggagat ccgtccccct tttcctttgt 3223

cgatatcatg taattagtta tgtcacgctt acattcacgc cctcccccca catccgctct 3283

aaccgaaaag gaaggagtta gacaacctga agtctaggtc cctatttatt tttttatagt 3343

tatgttagta ttaagaacgt tatttatatt tcaaattttt cttttttttc tgtacagacg 3403

cgtgtacgca tgtaacatta tactgaaaac cttgcttgag aaggttttgg gacgctcgaa 3463

ggctttaatt tgcaagctgg agaccaacat gtgagcaaaa ggccagcaaa aggccaggaa 3523

ccgtaaaaag gccgcgttgc tggcgttttt ccataggctc cgcccccctg acgagcatca 3583

caaaaatcga cgctcaagtc agaggtggcg aaacccgaca ggactataaa gataccaggc 3643

gtttccccct ggaagctccc tcgtgcgctc tcctgttccg accctgccgc ttaccggata 3703

cctgtccgcc tttctccctt cgggaagcgt ggcgctttct caatgctcac gctgtaggta 3763

tctcagttcg gtgtaggtcg ttcgctccaa gctgggctgt gtgcacgaac cccccgttca 3823

gcccgaccgc tgcgccttat ccggtaacta tcgtcttgag tccaacccgg taagacacga 3883

cttatcgcca ctggcagcag ccactggtaa caggattagc agagcgaggt atgtaggcgg 3943

tgctacagag ttcttgaagt ggtggcctaa ctacggctac actagaagga cagtatttgg 4003

tatctgcgct ctgctgaagc cagttacctt cggaaaaaga gttggtagct cttgatccgg 4063

caaacaaacc accgctggta gcggtggttt ttttgtttgc aagcagcaga ttacgcgcag 4123

aaaaaaagga tctcaagaag atcctttgat cttttctacg gggtctgacg ctcagtggaa 4183

cgaaaactca cgttaaggga ttttggtcat gagatc 4219

<210> 10

<211> 341

<212> PRT

<213> Artificial Sequence

<220>

<223> amino acid sequence of the modified version of the Lv-Rsn-1 surfactant protein encoded by the nucleotide sequence SEQ ID NO:9, which in turn is contained in SEQ ID NO:11.

<400> 10

Met Arg Phe Pro Ser Ile Phe Thr Ala Val Leu Phe Ala Ala Ser Ser

1 5 10 15

Ala Leu Ala Ala Pro Val Asn Thr Thr Thr Glu Asp Glu Thr Ala Gln

20 25 30

Ile Pro Ala Glu Ala Val Ile Gly Tyr Ser Asp Leu Glu Gly Asp Phe

35 40 45

Asp Val Ala Val Leu Pro Phe Ser Asn Ser Thr Asn Asn Gly Leu Leu

50 55 60

Phe Ile Asn Thr Thr Ile Ala Ser Ile Ala Ala Lys Glu Glu Gly Val

65 70 75 80

Ser Leu Glu Lys Arg Glu Ala Glu Ala Ala Gly Leu Leu Glu Gly Phe

85 90 95

Leu Val Gly Gly Gly Val Pro Gly Pro Gly Thr Ala Cys Leu Thr Lys

100 105 110

Ala Leu Lys Asp Ser Gly Asp Leu Leu Val Glu Leu Ala Val Ile Ile

115 120 125

Cys Ala Tyr Gln Asn Gly Lys Asp Leu Gln Glu Gln Asp Phe Lys Glu

130 135 140

Leu Lys Glu Leu Leu Glu Arg Thr Leu Glu Arg Ala Gly Cys Ala Leu

145 150 155 160

Asp Asp Ile Val Ala Asp Leu Gly Leu Glu Glu Leu Leu Gly Ser Ile

165 170 175

Gly Val Ser Thr Gly Asp Ile Ile Gln Gly Leu Tyr Lys Leu Leu Lys

180 185 190

Glu Leu Lys Ile Asp Glu Thr Val Phe Asn Ala Val Cys Asp Val Thr

195 200 205

Lys Lys Met Leu Asp Asn Lys Cys Leu Pro Lys Ile Leu Gln Gly Asp

210 215 220

Leu Val Lys Phe Leu Asp Leu Lys Tyr Lys Val Cys Ile Glu Gly Gly

225 230 235 240

Asp Pro Glu Leu Ile Ile Lys Asp Leu Lys Ile Ile Leu Glu Arg Leu

245 250 255

Pro Cys Val Leu Gly Gly Val Gly Leu Asp Asp Leu Phe Lys Asn Ile

260 265 270

Phe Val Lys Asp Gly Ile Leu Ser Phe Glu Gly Ile Ala Lys Pro Leu

275 280 285

Gly Asp Leu Leu Ile Leu Val Leu Cys Pro Asn Val Lys Asn Ile Asn

290 295 300

Val Ser Ser Glu Asn Leu Tyr Phe Gln Gly Ala Ala Ala Ser Phe Leu

305 310 315 320

Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Asn Ser Ala Val Asp His

325 330 335

His His His His His

340 
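As a non-limiting illustration, the length declared in field <211> can be checked against the residues listed under field <400>. The sketch below is not part of the disclosed method. It assumes the listing is available as plain text rows, and the helper name `parse_residue_rows` and the two-row excerpt are hypothetical. It converts three-letter residue codes to one-letter codes and skips rows that contain only position numbers.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_residue_rows(rows):
    """Convert rows of three-letter residues into a one-letter string.

    Blank rows and rows made only of position numbers (e.g. "20 25 30")
    are skipped. An unknown residue code raises KeyError.
    """
    letters = []
    for row in rows:
        tokens = row.split()
        if not tokens or all(token.isdigit() for token in tokens):
            continue  # numbering row or empty row, not sequence data
        for token in tokens:
            letters.append(THREE_TO_ONE[token])
    return "".join(letters)


# Hypothetical excerpt: only the first and last residue rows of SEQ ID NO:10.
excerpt = [
    "Met Arg Phe Pro Ser Ile Phe Thr Ala Val Leu Phe Ala Ala Ser Ser",
    "5 10 15",
    "His His His His His",
    "340",
]

sequence = parse_residue_rows(excerpt)
print(sequence, len(sequence))
```

Applied to all residue rows of SEQ ID NO:10 instead of the excerpt, the returned string would be expected to have the length stated in field <211>.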

1. A POLYNUCLEOTIDE characterized in that it encodes the predicted sequence of an isoform of the surfactant protein Lv-ranaspumin-1 (Lv-Rsn-1) and consisting of SEQ ID NO:2.
 2. A POLYNUCLEOTIDE characterized in that it encodes the predicted sequence of an isoform of the surfactant protein Lv-ranaspumin-1 (Lv-Rsn-1) and in that it has the codon frequency optimized for expression in bacteria, consisting of SEQ ID NO:3.
 3. A POLYNUCLEOTIDE characterized in that it encodes the predicted sequence of an isoform of the surfactant protein Lv-ranaspumin-1 (Lv-Rsn-1) and in that it has the codon frequency optimized for expression in yeast, consisting of SEQ ID NO:7.
 4. A POLYPEPTIDE characterized in that it is a modified version of an isoform of the surfactant protein Lv-ranaspumin-1 (Lv-Rsn-1) and consisting of SEQ ID NO:6.
 5. A POLYPEPTIDE characterized in that it is a modified version of an isoform of the surfactant protein Lv-ranaspumin-1 (Lv-Rsn-1) and consisting of SEQ ID NO:10.
 6. AN EXPRESSION CASSETTE characterized in that it comprises a polynucleotide according to claim 2 operably linked to a promoter that directs expression in bacteria.
 7. AN EXPRESSION CASSETTE characterized in that it comprises a polynucleotide according to claim 3 operably linked to a promoter that directs expression in fungi, preferably in yeast.
 8. AN EXPRESSION VECTOR characterized in that it comprises an expression cassette according to claim 6.
 9. AN EXPRESSION AND TRANSFORMATION VECTOR characterized in that it comprises an expression cassette according to claim 7.
 10. A GENETICALLY MODIFIED MICRO-ORGANISM characterized in that it is a bacterium that produces a protein whose encoding sequence comprises the polynucleotide according to claim 6.
 11. A GENETICALLY MODIFIED MICRO-ORGANISM characterized in that it is a yeast that produces a protein whose encoding sequence comprises the polynucleotide according to claim 7.
 12. A PROCESS FOR PRODUCING A GENETICALLY MODIFIED ORGANISM characterized in that it results in a bacterium according to claim 10 and comprises: a) transforming a bacterial strain with the expression cassette according to claim 6; b) selecting the transformed bacteria.
 13. A PROCESS FOR PRODUCING A GENETICALLY MODIFIED ORGANISM characterized in that it results in a yeast according to claim 11 and comprises: a) transforming a yeast strain with the expression cassette according to claim 7; b) selecting the transformed yeasts.
 14. A PRODUCT characterized in that it comprises a polypeptide according to claim 4.
 15. A PRODUCT characterized in that it comprises a polypeptide according to claim 5.
 16. AN ADVANCED OIL RECOVERY PROCESS AND IMPROVEMENT OF RESERVOIR FLUID DYNAMICS, using a biosurfactant protein obtained from a genetically modified organism, according to claims 1 to 15, characterized in that the genetically modified organism is capable of synthesizing the biosurfactant protein Lv-ranaspumin-1.
 17. AN OIL BIOREMEDIATION PROCESS, using a biosurfactant protein obtained from a genetically modified organism, according to claims 1 to 15, characterized in that the genetically modified organism is capable of synthesizing the biosurfactant protein Lv-ranaspumin-1.
 18. A TANK CLEANING PROCESS IN THE OIL AND GAS INDUSTRY, using a biosurfactant protein obtained from a genetically modified organism, according to claims 1 to 15, characterized in that the genetically modified organism is capable of synthesizing the biosurfactant protein Lv-ranaspumin-1. 